Semi-Supervised Learning of Alternative Splicing Events Using Co-Training
Authors
Abstract
Alternative splicing is a phenomenon that gives rise to multiple mRNA transcripts from a single gene. It is believed that a large number of genes undergo alternative splicing. Predicting alternative splicing events is a problem of great interest to biologists, as it can help them understand transcript diversity. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to learn accurate classifiers. While large amounts of genomic data are produced by the new sequencing technologies, labeling these data can be costly and time consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm makes use of two views of the data to iteratively learn two classifiers that inform each other, at each step, with their best predictions on the unlabeled data. We consider three sets of features for constructing views for the problem of predicting alternatively spliced exons: lengths of the exon of interest and its flanking introns, exonic splicing enhancers, and intronic regulatory sequences. Naive Bayes and Support Vector Machine (SVM) algorithms are used as base classifiers in our study. Experimental results show that the use of unlabeled data can result in better classifiers than those obtained from the small amount of labeled data alone.
Keywords: alternative splicing, semi-supervised learning, co-training.
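To make the co-training loop concrete, the following is a minimal Python sketch in the standard Blum and Mitchell style, using scikit-learn's GaussianNB and SVC as the two base classifiers named in the abstract. The view split (here hypothetically, length features versus splicing-enhancer/regulatory-sequence features), the confidence-selection size k, and the number of iterations are illustrative assumptions, not the authors' exact experimental setup.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, n_iter=10, k=5):
    # X1_*, X2_*: two feature views of the same exons
    # (e.g., view 1 = exon/flanking-intron lengths, view 2 = enhancer counts; assumed split)
    # y_l: 1 = alternatively spliced, 0 = constitutive
    clf1, clf2 = GaussianNB(), SVC(probability=True)
    for _ in range(n_iter):
        if len(X1_u) == 0:
            break
        clf1.fit(X1_l, y_l)
        clf2.fit(X2_l, y_l)

        # Each classifier selects the k unlabeled exons it is most confident about.
        idx1 = np.argsort(-clf1.predict_proba(X1_u).max(axis=1))[:k]
        idx2 = np.argsort(-clf2.predict_proba(X2_u).max(axis=1))[:k]
        idx2 = np.setdiff1d(idx2, idx1)  # avoid labeling the same exon twice

        # Each classifier pseudo-labels its own selections; both classifiers
        # retrain on the enlarged labeled set in the next iteration.
        new_y = clf1.predict(X1_u[idx1])
        if len(idx2):
            new_y = np.concatenate([new_y, clf2.predict(X2_u[idx2])])
        new_idx = np.concatenate([idx1, idx2])

        # Move the selected exons from the unlabeled pool to the labeled set.
        X1_l = np.vstack([X1_l, X1_u[new_idx]])
        X2_l = np.vstack([X2_l, X2_u[new_idx]])
        y_l = np.concatenate([y_l, new_y])
        keep = np.setdiff1d(np.arange(len(X1_u)), new_idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]

    return clf1, clf2

In practice one would also hold out a validation set to decide when to stop and balance how many positive and negative examples each classifier may add per iteration, as is common in co-training implementations.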
Similar Resources
Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm
Most previous approaches to automatic prosodic event detection are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic labels of interest in order to train the classification models. However, creating such resources is an expensive and time-consuming task. In this paper, we exploit semi-supervised learning with the co-training algorithm f...
Global/Local Hybrid Learning of Mixture-of-Experts from Labeled and Unlabeled Data
The mixture-of-experts (ME) models can be useful for solving complicated classification problems in the real world. However, in order to train the ME model with not only labeled data but also unlabeled data, which are easier to obtain, a new learning algorithm that considers the characteristics of the ME model is required. We proposed global-local co-training (GLCT), the hybrid training method of the ME mode...
Semi-supervised Learning of Alternatively Spliced Exons using Expectation Maximization Type Approaches
Successful advances in DNA sequencing technologies have made it possible to obtain tremendous amounts of data fast and inexpensively. As a consequence, the corresponding genome annotation has become the bottleneck in our understanding of genes and their functions. Traditionally, data from biological domains have been analyzed using supervised learning techniques. However, given the large amounts of ...
A Semi-Supervised Learning Algorithm Based on Modified Self-training SVM
In this paper, we first introduce some facts about semi-supervised learning and its commonly used methods, such as generative mixture models, self-training, co-training, and Transductive SVM. Then we present a self-training semi-supervised SVM algorithm, based on which we give a modified algorithm. In order to demonstrate its validity and effectiveness, we carry out some experiments whi...
Semi-supervised learning for text classification using feature affinity regularization
Most conventional semi-supervised learning methods attempt to directly include unlabeled data into training objectives. This paper presents an alternative approach that learns feature affinity information from unlabeled data, which is incorporated into the training objective as regularization of a maximum entropy model. The regularization favors models for which correlated features have similar...
Journal:
Volume, Issue:
Pages: -
Publication year: 2011